Job Radar. Live notifications. AI processed.
upwork.com 2026-04-30 🟢
🔹 [Target] Public procurement open data ETL [Method] Maintain and extend ingestion pipelines, add new sources, write scrapers, map formats, improve data quality [UI/UX] Not specified [Stack] Python, SQL (Postgres), AWS (S3, RDS, Step Functions) [Security] Not specified [Format] JSON, CSV, APIs
👤 Client: 🇮🇱 Israel Member since 2024-03-23
💰 Price: ****
🚩 Problem: Maintain and extend ETL processes for public procurement open data to ensure consistent and high-quality data ingestion.
📦 Existing: Not specified
Specifications:
[Target] Public procurement open data ETL
[Method] Maintain and extend existing pipelines (XML, JSON, CSV, APIs), add new sources from EU national portals, write scrapers when no clean feed is available, map heterogeneous formats to canonical schema, improve data quality through deduplication, entity matching, classification
[UI/UX] Not specified
[Stack] Python, SQL (Postgres), AWS (S3, RDS, Step Functions)
[Security] Not specified
[Format] JSON, CSV, APIs
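The mapping task in [Method] can be sketched as per-format parsers that all emit one canonical record. The field names below (`notice_id`, `buyer`, `value_eur`) and the source keys are hypothetical, since the posting does not publish the actual schema; this is a minimal stdlib-only illustration, not the client's implementation.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical canonical schema -- the real one is defined with the CTO.
CANONICAL_FIELDS = ("notice_id", "buyer", "value_eur")

def from_json(raw: str) -> dict:
    """Map a JSON notice (assumed keys) to the canonical record."""
    rec = json.loads(raw)
    return {"notice_id": rec["id"], "buyer": rec["buyerName"], "value_eur": float(rec["value"])}

def from_csv(raw: str) -> dict:
    """Map the first row of a CSV feed (assumed headers) to the canonical record."""
    row = next(csv.DictReader(io.StringIO(raw)))
    return {"notice_id": row["ID"], "buyer": row["Buyer"], "value_eur": float(row["Value"])}

def from_xml(raw: str) -> dict:
    """Map an XML notice (assumed element names) to the canonical record."""
    root = ET.fromstring(raw)
    return {
        "notice_id": root.findtext("id"),
        "buyer": root.findtext("buyer"),
        "value_eur": float(root.findtext("value")),
    }
```

Each parser owns the quirks of its source, so downstream steps (dedup, classification, loading into Postgres) only ever see canonical records.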
Workflow:
Review and understand existing ETL pipelines for public procurement open data.
Identify gaps in current ingestion processes and propose improvements.
Develop or enhance Python scripts to handle new source formats (XML, JSON, CSV, APIs) and ingest data from EU national portals.
Write scrapers using libraries such as Scrapy, BeautifulSoup (bs4), or Selenium when no clean feed is available.
Map heterogeneous source formats to the company's canonical schema for consistency.
Implement deduplication, entity matching, and classification processes to improve data quality.
Collaborate with CTO on data strategy and schema design.
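The deduplication and entity-matching step above could look roughly like this: exact dedup on a notice key plus fuzzy matching of buyer names. This is a stdlib-only sketch (`difflib`, not a production matcher); the key name `notice_id` and the 0.85 threshold are assumptions, not values from the posting.

```python
from difflib import SequenceMatcher

def same_entity(a: str, b: str, threshold: float = 0.85) -> bool:
    """Fuzzy-match two buyer names after normalising case and whitespace."""
    norm = lambda s: " ".join(s.lower().split())
    return SequenceMatcher(None, norm(a), norm(b)).ratio() >= threshold

def dedupe(notices: list[dict]) -> list[dict]:
    """Keep the first notice seen per notice_id (hypothetical key), drop repeats."""
    seen, out = set(), []
    for n in notices:
        key = n["notice_id"]
        if key in seen:
            continue
        seen.add(key)
        out.append(n)
    return out
```

In practice a pipeline like this would likely swap `SequenceMatcher` for a dedicated record-linkage library and push exact dedup into Postgres constraints, but the two-stage shape (exact key dedup, then fuzzy entity resolution) is the same.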